A Multilingual Method for Clause Splitting
Abstract
This paper addresses the clause splitting problem and proposes a multilingual method for detecting clause boundaries in unrestricted texts. The method combines language-independent machine learning techniques with language-specific rules as a first step towards building the hierarchical structure of sentences. The output of a machine learning algorithm, trained on an annotated corpus, is processed by a rule-based module that handles clause boundaries not covered by the learning process. Formal indicators of coordination and subordination, together with verb type information (finite or non-finite), are used to identify clause boundaries. The method was evaluated on Romanian and English; the F-measure for clause start detection is 95% for Romanian and 92% for English.
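The abstract reports results as an F-measure for clause start detection. As a reading aid, the sketch below shows how such a score is commonly computed when clause starts are represented as token positions. It assumes the balanced F-measure (F1) and exact position matching; the function name `clause_start_f_measure` and the toy positions are hypothetical and do not come from the paper, whose exact evaluation setup is not described here.

```python
def clause_start_f_measure(gold_starts, predicted_starts):
    """Return (precision, recall, F1) for clause start detection.

    Both arguments are iterables of token positions at which a clause
    starts. Assumption: a prediction counts as correct only if it
    matches a gold position exactly.
    """
    gold = set(gold_starts)
    predicted = set(predicted_starts)
    true_positives = len(gold & predicted)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): four gold clause starts, four predictions,
# three of which match a gold position.
gold = [0, 5, 9, 14]
predicted = [0, 5, 10, 14]
precision, recall, f1 = clause_start_f_measure(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four predicted starts are correct and three of the four gold starts are found, so precision, recall and F1 are all 0.75.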
Similar resources
Anaphora - Clause Annotation and Alignment Tool
The paper presents Anaphora, an OS- and language-independent tool for clause annotation and alignment, developed at the Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. The tool supports automated sentence splitting and alignment, as well as modes for manual monolingual annotation and multilingual alignment of sentences and clauses. Anaphora has ...
Chinese Event Descriptive Clause Splitting with Structured SVMs
Chinese event descriptive clause splitting is the task of splitting a complex Chinese sentence into several clauses. In this paper, we present a discriminative approach to the Chinese event descriptive clause splitting task. By formulating Chinese clause splitting as a sequence labeling problem, we apply the structured SVMs model to it. Compared with other two baseli...
Constraint Manipulation in SGGS
SGGS (Semantically-Guided Goal-Sensitive theorem proving) is a clausal theorem-proving method, with a seemingly rare combination of properties: it is first order, DPLL-style model based, semantically guided, goal sensitive, and proof confluent. SGGS works with constrained clauses, and uses a sequence of constrained clauses to represent a tentative model of the given set of clauses. A basic buil...
A hybrid method for clause splitting in unrestricted English texts
It is important to know the structure of the sentence for many NLP tasks. In this paper we propose a hybrid method for clause splitting in unrestricted English texts which requires less human work than existing approaches. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a shallow rule-based module in order to improve the accuracy of the method. The ...
The importance of annotated corpora for NLP: the cases of anaphora resolution and clause splitting
In this paper we present two applications that depend on annotated corpora for their implementation, evaluation and improvement. The first is an automatic anaphora resolution system. After describing the algorithm we discuss the importance of corpora for the tasks of evaluation and automatic scoring and the development of a coreferentially annotated corpus. We go on to look ahead at the role of...